Round-trip Testing by Nikil-Shyamsunder · Pull Request #187 · cucapra/protocols

Nikil-Shyamsunder · 2026-02-15T22:16:46Z

This PR focuses on infrastructure, not semantic bug fixes.

We now integrate roundtrip checks with Turnt instead of running only a standalone script.

Added a roundtrip environment to Turnt.
The roundtrip command runs once per .tx file via: scripts/roundtrip_case.py {filename}
Added a convenience entry in the justfile to run roundtrip through Turnt.
CI now runs the roundtrip tests.

For each .tx file:

Parse the // ARGS and // RETURN metadata.
Skip tests with a non-zero // RETURN (recorded as skip).
For each trace in the .tx file, run the interpreter to generate a waveform.
Run the monitor on the generated waveform using the same protocol context.
Compare the expected trace block from the .tx file against the monitor-emitted trace block(s) after normalization (stripping comments, whitespace, etc.).
Emit success/skipped/failure messages.
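
The metadata and normalization steps above could be sketched roughly as follows. This is a minimal illustration, not the code in scripts/roundtrip_case.py: the helper names (parse_metadata, should_skip, normalize_trace) are hypothetical, and the `// ARGS: ...` / `// RETURN: ...` layout with a colon is assumed from Turnt's usual convention.

```python
import re

def parse_metadata(tx_text):
    """Read `// ARGS: ...` and `// RETURN: ...` header comments (layout assumed)."""
    args, expected_return = "", 0
    for line in tx_text.splitlines():
        m = re.match(r"\s*//\s*(ARGS|RETURN)\s*:\s*(.*)", line)
        if not m:
            continue
        if m.group(1) == "ARGS":
            args = m.group(2).strip()
        else:
            expected_return = int(m.group(2).strip())
    return args, expected_return

def should_skip(expected_return):
    """Tests that expect a non-zero return code are recorded as skipped."""
    return expected_return != 0

def normalize_trace(block):
    """Drop `//` comments and all whitespace so only the transactions are compared."""
    without_comments = re.sub(r"//[^\n]*", "", block)
    return re.sub(r"\s+", "", without_comments)
```

With this normalization, two trace blocks that differ only in indentation or trailing comments compare equal as plain strings.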

Turnt snapshots roundtrip stdout into .rt files.

Updated the Turnt output to also include the expected (monitor) traces and the actual (interpreter) trace. For a succeeding test, it might look like:

trace_block: 0
trace_result: PASS
matched_monitor_trace_index: 0
interpreter_trace:
trace {
    multiple_assign(1, 1);
    multiple_assign(1, 1);
}

parsed_monitor_trace_candidates:
candidate_monitor_trace: 0
trace {
    multiple_assign(1, 1);
    multiple_assign(1, 1);
}

candidate_monitor_trace: 1
trace {
    multiple_assign(1, 1);
    two_fork_err(1, 1);
}

This is useful even for passing examples: the snapshot reveals any change in interpreter or monitor behavior that alters how many or which traces appear, instead of recording only a blanket "pass".
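
One way to picture how a trace block ends up as PASS with a matched_monitor_trace_index is the sketch below. It is an assumption-laden illustration: compare_trace and normalize are hypothetical names, and the real script may order or format its output differently. The field names are copied from the sample .rt output in this description, and the trace_mismatch failure kind is the one shown in the failing sample further down.

```python
import re

def normalize(trace):
    """Strip `//` comments and whitespace before comparing (assumed normalization)."""
    return re.sub(r"\s+", "", re.sub(r"//[^\n]*", "", trace))

def compare_trace(block_index, interpreter_trace, monitor_traces):
    """Build the report lines for one trace block."""
    wanted = normalize(interpreter_trace)
    # Index of the first monitor candidate equal to the interpreter trace, if any.
    match = next(
        (i for i, t in enumerate(monitor_traces) if normalize(t) == wanted), None
    )
    lines = [f"trace_block: {block_index}"]
    if match is None:
        lines += ["trace_result: FAIL", "failure_kind: trace_mismatch"]
    else:
        lines += ["trace_result: PASS", f"matched_monitor_trace_index: {match}"]
    lines += ["interpreter_trace:", interpreter_trace.strip(), ""]
    lines.append("parsed_monitor_trace_candidates:")
    for i, candidate in enumerate(monitor_traces):
        lines += [f"candidate_monitor_trace: {i}", candidate.strip(), ""]
    return lines
```

Printing every candidate, not just the match, is what lets the snapshot catch a change in the number of traces the monitor reports.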

A roundtrip trace might fail like this due to a monitor error:

trace_block: 0
trace_result: FAIL
failure_kind: monitor_error
interpreter_trace:
trace {
    add(1, 2, 3);
    add(4, 5, 9);
}

monitor_error:
All schedulers failed: No transactions match the waveform for DUT `Adder`
Failure at cycle 1: No transactions match the waveform in `.roundtrip_tmp/adders-adder_d1-both_threads_pass_0.fst`.
Possible transactions: [add, add_fork_early, add_doesnt_end_in_step, add_incorrect, add_incorrect_implicit, wait_and_add]
Error: Monitor failed

Or it might fail where the actual trace is not in the monitor output, like this:

trace_block: 0
trace_result: FAIL
failure_kind: trace_mismatch
interpreter_trace:
trace {
    reset();
    push(2);
    pop(2);
    idle();
    push(3);
    pop(3);
}

parsed_monitor_trace_candidates:
candidate_monitor_trace: 0
trace {
    reset();
    push(2);
    pop(2);
}

Monitor failures due to panics are sanitized for two reasons: the output contains machine-specific thread numbers (observed on the GitHub Actions CI machine), and the line numbers in the "panicked at" line change constantly.

We might also consider filing a low-priority issue to port some of this to Rust (specifically the logic for generating a trace, passing it to the monitor, and comparing the resulting traces) so that we have native support instead of relying on janky string processing.
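
A sanitizer along these lines could look like the sketch below. The exact panic format is an assumption: it targets messages shaped like `thread 'main' (12345) panicked at path/file.rs:LINE:COL:`, and sanitize_panic is a hypothetical name, not the function in the script.

```python
import re

# Numeric thread id that some Rust toolchains print after the thread name (assumed format).
_THREAD_ID = re.compile(r"(thread '[^']*') \(\d+\)")
# Source location on the "panicked at" line; keep the file, hide line and column.
_LOCATION = re.compile(r"(panicked at \S+?):\d+:\d+")

def sanitize_panic(text):
    """Remove details that differ across machines or change with every code edit."""
    text = _THREAD_ID.sub(r"\1", text)
    return _LOCATION.sub(r"\1:<line>:<col>", text)
```

Output that contains no panic line passes through unchanged, so the same function can be applied to all monitor stderr.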


ngernest · 2026-02-16T00:29:04Z

Looks good to me, thanks for working on this! I just updated the turnt.toml file to use uv run instead of python3 to invoke the round-tripping script (to maintain consistency with the other Python scripts in the repo).

I wonder if we should try to resolve all the failing round-trip tests in this PR or save them for another PR -- thoughts?

scripts/roundtrip_case.py

Nikil-Shyamsunder · 2026-02-16T01:27:54Z

I wonder if we should try to resolve all the failing round-trip tests in this PR or save them for another PR -- thoughts?

I think there are too many disparate non-trivial fixes to have them all in one PR.

ngernest · 2026-02-16T23:56:14Z

Gotcha, I think Kevin @ekiwi normally prefers for all CI checks to be passing before merging a PR, I'll defer to him re: whether we can leave some round-trip tests failing when we merge this PR (i.e. fix them in a separate future PR).

ngernest · 2026-02-17T02:31:34Z

This looks really good! Thanks for updating this so quickly! I'm happy with this

Nikil-Shyamsunder · 2026-02-17T02:33:32Z

Gotcha, I think Kevin @ekiwi normally prefers for all CI checks to be passing before merging a PR, I'll defer to him re: whether we can leave some round-trip tests failing when we merge this PR (i.e. fix them in a separate future PR).

Agreed. The check will "pass" now, but some examples will still have trace_result: FAIL, which we can resolve either before merging this or later. Agree to defer to Kevin. Thanks so much for the great feedback on this review.

ngernest · 2026-02-18T19:18:40Z

protocols/tests/adders/adder_d1/both_threads_pass.rt

TODO for Ernest: check why the monitor is failing here

ngernest · 2026-02-18T19:19:37Z

protocols/tests/adders/adder_d1/wait_and_add_correct.rt

TODO for Ernest: check why monitor is failing here

ngernest · 2026-02-18T19:21:35Z

protocols/tests/adders/adder_d0/add_combinational.rt

TODO for Ernest: investigate this (b ought to be defined)

ngernest · 2026-02-18T19:27:32Z

protocols/tests/identities/identity_d0/passthrough_combdep.rt

TODO for Ernest: this is probably an off-by-one error (check the waveform manually; it should have length 2)

If waveform has length 3, there's a bug w/ interpreter

If waveform has length 2, there's a bug w/ monitor

ngernest · 2026-02-18T19:38:09Z

Context for @Nikil-Shyamsunder: Kevin and I discussed the failing round-trip tests in person today. I left comments above on the ones that I need to investigate further (to see if they're bugs in the monitor). Kevin says we should wait until these failing tests are fixed before merging this PR, if that's OK!

Nikil-Shyamsunder · 2026-02-18T19:47:11Z

sounds like a plan!


ngernest · 2026-02-23T19:40:44Z

Added a new CLI flag --allow-round-trip-failure to roundtrip_case.py, which allows the user to mark an individual .tx file as allowed to fail in round-trip mode.

I added this flag to all the .tx files which call protocols that involve repeat ... loops, since those aren't supported in the monitor for now. (I will continue to fix the monitor for the other failing round-trip tests.)

Note: because of how Turnt's overrides work (the same .tx file is used as an input for both the interp and roundtrip environments in Turnt), I also needed to add --allow-round-trip-failure as a "dummy" CLI argument to the Rust interpreter executable (which just ignores this flag).
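
For illustration, wiring such a flag into a Python script with argparse might look like this. Only the flag name comes from this thread. The positional tx_file argument, the exit_code helper, and the idea that the flag suppresses a non-zero exit status are assumptions made for the sketch.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Round-trip one .tx file (sketch)")
    parser.add_argument("tx_file", help="path to the .tx file under test")
    parser.add_argument(
        "--allow-round-trip-failure",
        action="store_true",
        help="mark this .tx file as allowed to fail in round-trip mode",
    )
    return parser

def exit_code(any_trace_failed, allow_failure):
    """Hypothetical policy: a failure only counts when it is not explicitly allowed."""
    return 1 if any_trace_failed and not allow_failure else 0
```

Because the flag is a plain boolean switch, a second program that receives the same arguments can accept and ignore it without changing its behavior.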

integrate with turnt

b6a4906

Nikil-Shyamsunder requested a review from ngernest February 15, 2026 22:29

Nikil-Shyamsunder and others added 2 commits February 15, 2026 14:36

make roundtrip a separate job

6d5fbe1

Use uv run instead of python3 when invoking round-tripping script

6b03d04

ngernest approved these changes Feb 16, 2026


add docstrings

bff8547

Nikil-Shyamsunder added 4 commits February 16, 2026 18:13

add trace outputs for both failing and passing cases, remove ci skip

841b5e6

updated roundtrip case

766cb0d

make the .rt file formatting prettier

3144053

modify python script to have prettier .rt outputs

358df9c

Nikil-Shyamsunder added 2 commits February 16, 2026 18:39

debug: add --diff flag due to divergence in .rt results between ci machine and local

f996b8f

normalize panic outputs

68a5a2e

Nikil-Shyamsunder requested a review from ekiwi February 17, 2026 20:16

ngernest mentioned this pull request Feb 18, 2026

[Monitor] Create CLI flag to ignore #[idle] pragma (always print idle() transactions) #189

Closed


ngernest added 4 commits February 18, 2026 14:56

[skip ci] Add TODO for Ernest

1ec0679

Merge remote-tracking branch 'origin/main' into roundtrip-test-turnt

0a778f7

Add extra CLI flag to round-trip testing script to keep intermediate FST file

d947ec2

Merge remote-tracking branch 'origin/main' into roundtrip-test-turnt

bd12199

ngernest mentioned this pull request Feb 23, 2026

[Interp] Produce non-empty waveforms for combinational DUTs #196

Open

ngernest added 4 commits February 23, 2026 14:18

Add allowed-to-fail CLI flag to round-tripping script

5c7b979

Rename CLI flag to --allow-round-trip-failure, mark all tests with repeat loops as allowed-to-fail for RT

ef6e46e

Bunch of fixes to allow individual tests to be skipped from round-trips, apply to all tests with repeat loops

48fab65

Formatting

2a4a1a4

ngernest added 2 commits February 23, 2026 14:43

Only print name of fst file if user explicitly specifies it

aa039ed

Update another .rt file

374eced
